
    Computing Bits of Algebraic Numbers

    We initiate the complexity-theoretic study of the problem of computing the bits of (real) algebraic numbers. This extends the work of Yap on computing, in Logspace, the bits of transcendental numbers like π. Our main result is that computing a bit of a fixed real algebraic number is in C=NC^1 ⊆ Logspace when the bit position is given in verbose (unary) representation, and in the counting hierarchy when it is given in succinct (binary) representation. Our tools are drawn from elementary analysis and numerical analysis, and include the Newton-Raphson method. The proof of our main result is entirely elementary, preferring the elementary Liouville's theorem to the much deeper Roth's theorem for algebraic numbers. We leave the possibility of proving non-trivial lower bounds for the problem of computing the bits of an algebraic number, given the bit position in binary, as our main open question. In this direction we make limited progress by proving a lower bound for rationals.
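
    To make the Newton-Raphson ingredient concrete, here is a minimal Python sketch (ours, not the paper's algorithm) that computes the n-th bit of the algebraic number √2, a root of x^2 - 2, using exact rational arithmetic; the iteration count is a rough upper bound rather than a tuned choice.

```python
from fractions import Fraction

def nth_bit_of_sqrt2(n: int) -> int:
    """Return the n-th bit after the binary point of sqrt(2), a root of
    f(x) = x^2 - 2, via Newton-Raphson with exact rational arithmetic."""
    x = Fraction(3, 2)                        # initial guess in (1, 2)
    # Newton's method roughly doubles the number of correct bits per step,
    # so O(log n) iterations give well over n bits of precision.
    for _ in range(max(6, n.bit_length() + 3)):
        x -= (x * x - 2) / (2 * x)            # x <- x - f(x)/f'(x)
    # The n-th fractional bit of x is floor(x * 2^n) mod 2.
    return (x.numerator << n) // x.denominator % 2

# sqrt(2) = 1.011010100..._2
print([nth_bit_of_sqrt2(i) for i in range(1, 10)])  # [0, 1, 1, 0, 1, 0, 1, 0, 0]
```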

    Testing Uniformity of Stationary Distribution

    A random walk on a directed graph gives a Markov chain on the vertices of the graph. An important question that often arises in the context of Markov chains is whether the uniform distribution on the vertices of the graph is a stationary distribution of the chain. The stationary distribution of a Markov chain is a global property of the graph. In this paper, we prove that for a regular directed graph, whether the uniform distribution on the vertices is a stationary distribution depends on a local property of the graph, namely that if (u,v) is a directed edge then outdegree(u) equals indegree(v). This result also has an application to the problem of testing whether a given distribution is uniform or "far" from uniform, a well-studied problem in property testing and statistics. If the distribution is the stationary distribution of the lazy random walk on a directed graph and the graph is given as input, how many bits of the input graph does one need to query in order to decide whether the distribution is uniform or "far" from it? This is a graph property testing problem, and we consider it in the orientation model (introduced by Halevy et al.). We reduce this problem to testing (in the orientation model) whether a directed graph is Eulerian, and using the result of Fischer et al. on the query complexity of testing (in the orientation model) whether a graph is Eulerian, we obtain bounds on the query complexity of testing whether the stationary distribution is uniform.
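
    The local condition is straightforward to check in code. A minimal sketch (assuming an edge-list representation; the function name is ours):

```python
from collections import defaultdict

def uniform_is_stationary(edges):
    """Check the local condition: for every directed edge (u, v),
    outdegree(u) == indegree(v).  When it holds, the uniform distribution
    is stationary for the random walk: each vertex v receives total
    probability indeg(v) * (1/n) * (1/indeg(v)) = 1/n per step."""
    outdeg, indeg = defaultdict(int), defaultdict(int)
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return all(outdeg[u] == indeg[v] for u, v in edges)

# Directed 4-cycle: every outdegree and indegree is 1, so uniform is stationary.
print(uniform_is_stationary([(0, 1), (1, 2), (2, 3), (3, 0)]))          # True
# Add a chord 0 -> 2: now outdeg(0) = 2 but indeg(1) = 1, so it fails.
print(uniform_is_stationary([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]))  # False
```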

    Efficient Compression Technique for Sparse Sets

    Recent technological advancements have led to the generation of huge amounts of data over the web, such as text, images, audio, and video. Most of this data is high dimensional and sparse, e.g., the bag-of-words representation used for text. Many applications, such as clustering, nearest neighbour search, ranking, and indexing, require an efficient search for similar data points. Even with significant increases in computational power, a brute-force similarity search on such datasets is inefficient and at times impossible. It is therefore desirable to compute a compressed representation that preserves the similarity between data points. In this work, we consider the data points as sets and use Jaccard similarity as the similarity measure. Compression techniques are generally evaluated on the following parameters: 1) the randomness required for compression, 2) the time required for compression, 3) the dimension of the data after compression, and 4) the space required to store the compressed data. Ideally, the compressed representation should preserve the similarity between each pair of data points while keeping the time and the randomness required for compression as low as possible. We show that the compression technique suggested by Pratap and Kulkarni also works well for Jaccard similarity. We present a theoretical proof of this and complement it with rigorous experiments on synthetic as well as real-world datasets. We also compare our results with the state-of-the-art "min-wise independent permutation" and show that our compression algorithm achieves almost equal accuracy while significantly reducing the compression time and the randomness.
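
    A minimal sketch of the bucket-OR style of compression the abstract refers to, reconstructed from the description above (the exact scheme of Pratap and Kulkarni may differ in its details):

```python
import random

def compress(s, d, k, seed=0):
    """Bucket-OR compression of a set s over universe {0, ..., d-1}:
    randomly map each of the d dimensions to one of k buckets and OR the
    bits within each bucket; the compressed set is the set of occupied
    buckets.  For sparse sets, collisions are rare and Jaccard similarity
    is approximately preserved."""
    rng = random.Random(seed)                 # same mapping for every set
    bucket = [rng.randrange(k) for _ in range(d)]
    return {bucket[i] for i in s}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

random.seed(1)
d, k = 10_000, 500
a = set(random.sample(range(d), 60))
b = set(random.sample(range(d), 60)) | set(list(a)[:30])
print(f"true Jaccard:       {jaccard(a, b):.3f}")
print(f"compressed Jaccard: {jaccard(compress(a, d, k), compress(b, d, k)):.3f}")
```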

    Improved Outlier Robust Seeding for k-means

    The k-means is a popular clustering objective, although it is inherently non-robust and sensitive to outliers. Its popular seeding or initialization, called k-means++, uses D^2 sampling and comes with a provable O(log k) approximation guarantee [AV2007]. However, in the presence of adversarial noise or outliers, D^2 sampling is more likely to pick centers from distant outliers than from inlier clusters, so its approximation guarantee w.r.t. the k-means solution on the inliers does not hold. Assuming that the outliers constitute a constant fraction of the given data, we propose a simple variant of the D^2 sampling distribution that makes it robust to outliers. Our algorithm runs in O(ndk) time, outputs O(k) clusters, discards marginally more points than the optimal number of outliers, and comes with a provable O(1) approximation guarantee. The algorithm can also be modified to output exactly k clusters instead of O(k), while keeping its running time linear in n and d. This improves over previous results for robust k-means based on LP relaxation and rounding [Charikar, KrishnaswamyLS18] and robust k-means++ [DeshpandeKP20]. Our empirical results show the advantage of our algorithm over k-means++ [AV2007], uniform random seeding, greedy sampling for k-means [tkmeanspp], and robust k-means++ [DeshpandeKP20] on standard real-world and synthetic data sets used in previous work. Our proposal is easily amenable to scalable, faster, parallel implementations of k-means++ [Bahmani, BachemL017] and is of independent interest for coreset constructions in the presence of outliers [feldman2007ptas, langberg2010universal, feldman2011unified].
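
    For reference, a minimal sketch of the standard D^2 (k-means++) seeding that the abstract modifies; the robust variant itself is not reproduced here.

```python
import random

def d2_seeding(points, k, seed=0):
    """Standard k-means++ (D^2) seeding [AV2007]: each new center is
    sampled with probability proportional to its squared distance to the
    nearest center chosen so far."""
    rng = random.Random(seed)

    def sqdist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sqdist(p, c) for c in centers) for p in points]
        # A distant outlier gets a huge d2 and is very likely to be picked;
        # this is exactly the failure mode the paper's variant tempers.
        r, acc = rng.uniform(0, sum(d2)), 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (100, 100)]
print(d2_seeding(points, k=2))   # the outlier (100, 100) is often chosen
```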

    Minwise-Independent Permutations with Insertion and Deletion of Features

    In their seminal work, Broder et al. [BroderCFM98] introduced the minHash algorithm, which computes a low-dimensional sketch of high-dimensional binary data that closely approximates pairwise Jaccard similarity. Since its invention, minHash has been commonly used by practitioners in various big data applications. In many real-life scenarios the data is dynamic, and its feature set evolves over time. We consider the case when features are dynamically inserted into and deleted from the dataset. A naive solution is to recompute minHash with respect to the updated dimension after every change, but this is expensive, as it requires generating fresh random permutations. To the best of our knowledge, no systematic study of minHash in the context of dynamic insertion and deletion of features has been recorded. In this work, we initiate this study and suggest algorithms that make minHash sketches adaptable to dynamic insertion and deletion of features. We give a rigorous theoretical analysis of our algorithms and complement it with extensive experiments on several real-world datasets. Empirically, we observe a significant speed-up in running time while offering performance comparable to running minHash from scratch. Our proposal is efficient, accurate, and easy to implement in practice.
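
    As background, a minimal sketch of the static minHash baseline (using random linear hash functions in place of truly random permutations, as is common in practice); the paper's update algorithms for feature insertion and deletion are not reproduced here.

```python
import random

def minhash_signature(s, num_hashes=100, prime=10_007, seed=7):
    """Classic minHash [BroderCFM98]: hash every element of s with
    num_hashes random linear functions (standing in for random
    permutations) and keep each minimum.  For two sets, the probability
    that the signatures agree at a position equals their Jaccard
    similarity."""
    rng = random.Random(seed)
    sig = []
    for _ in range(num_hashes):
        a, b = rng.randrange(1, prime), rng.randrange(prime)
        sig.append(min((a * x + b) % prime for x in s))
    return sig

def estimate_jaccard(sig_a, sig_b):
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

A, B = {1, 2, 3, 4, 5}, {3, 4, 5, 6, 7}      # true Jaccard = 3/7 ~ 0.43
print(estimate_jaccard(minhash_signature(A), minhash_signature(B)))
```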

    Efficient Sketching Algorithm for Sparse Binary Data

    Recent advancements in the WWW, IoT, social networks, e-commerce, etc. have generated a large volume of data. These datasets are mostly high dimensional and sparse. Many fundamental subroutines of common data analytics tasks, such as clustering, classification, ranking, and nearest neighbour search, scale poorly with the dimension of the dataset. In this work, we address this problem and propose a sketching (alternatively, dimensionality reduction) algorithm, BinSketch (Binary Data Sketch), for sparse binary datasets. BinSketch preserves the binary nature of the dataset after sketching and maintains estimates for multiple similarity measures, namely Jaccard, cosine, and inner-product similarities and Hamming distance, on the same sketch. We present a theoretical analysis of our algorithm and complement it with extensive experiments on several real-world datasets. We compare the performance of our algorithm with state-of-the-art algorithms on mean-squared-error and ranking tasks. Our algorithm offers comparable accuracy while achieving a significant speedup in dimensionality-reduction time relative to the other candidate algorithms. Our proposal is simple and easy to implement, and can therefore be adopted in practice.
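
    A minimal sketch of the OR-style binary sketching the abstract describes, together with the kind of balls-in-bins inversion such estimators rely on (the paper's exact estimators for the four similarity measures are not reproduced here):

```python
import math, random

def binsketch(x, d, n, seed=0):
    """OR-compress a sparse binary vector (given as its set of nonzero
    coordinates over {0, ..., d-1}) into n bits by hashing each coordinate
    to one of n buckets.  The sketch is itself binary, so it can be stored
    and compared like the original data."""
    rng = random.Random(seed)
    h = [rng.randrange(n) for _ in range(d)]
    return {h[i] for i in x}

def estimate_weight(sketch, n):
    """Invert the balls-in-bins collision probability: w balls occupy
    about n * (1 - (1 - 1/n)^w) of n bins in expectation, so solve for w.
    Similar inversions yield Hamming-distance and inner-product estimates
    from the same sketch."""
    return math.log(1 - len(sketch) / n) / math.log(1 - 1 / n)

random.seed(2)
d, n = 50_000, 2_000
x = set(random.sample(range(d), 300))
print(round(estimate_weight(binsketch(x, d, n), n)))   # close to 300
```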

    One-Pass Additive-Error Subset Selection for ℓ_p Subspace Approximation

    We consider the problem of subset selection for ℓ_p subspace approximation: efficiently find a small subset of data points such that solving the problem optimally for this subset gives a good approximation to solving the problem optimally for the original input. Previously known subset selection algorithms based on volume sampling and adaptive sampling [Deshpande and Varadarajan, 2007], for the general case of p ∈ [1, ∞), require multiple passes over the data. In this paper, we give a one-pass subset selection with an additive approximation guarantee for ℓ_p subspace approximation, for any p ∈ [1, ∞). Earlier subset selection algorithms that give a one-pass multiplicative (1+ε) approximation work only in special cases: Cohen et al. [Michael B. Cohen et al., 2017] give a one-pass subset selection with a multiplicative (1+ε) approximation guarantee for the special case of ℓ_∞ subspace approximation, and Mahabadi et al. [Sepideh Mahabadi et al., 2020] give a one-pass noisy subset selection with a (1+ε) approximation guarantee for ℓ_p subspace approximation when p ∈ {1, 2}. Our subset selection algorithm gives a weaker, additive approximation guarantee, but it works for any p ∈ [1, ∞).
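
    For reference, the objective in question as a minimal numpy sketch (we minimize the sum of p-th powers of distances, which is equivalent to the usual (Σ dist^p)^(1/p) formulation); the sampling algorithm itself is not reproduced here.

```python
import numpy as np

def lp_subspace_cost(A, V, p):
    """The l_p subspace approximation objective: sum over the rows a of A
    of dist(a, span(V))^p, where V has orthonormal columns.  Subset
    selection asks for a few rows of A whose optimal subspace nearly
    minimizes this cost over all of A."""
    residuals = A - (A @ V) @ V.T             # component orthogonal to span(V)
    return np.sum(np.linalg.norm(residuals, axis=1) ** p)

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))                 # 100 points in R^5
V, _ = np.linalg.qr(rng.normal(size=(5, 2)))  # a random 2-dim subspace
print(lp_subspace_cost(A, V, p=1.5))
```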

    Optical coherence tomography and subclinical optic neuritis in longitudinally extensive transverse myelitis

    Objective: To compare the retinal nerve fiber layer (RNFL) thickness of longitudinally extensive transverse myelitis (LETM) eyes without previous optic neuritis with that of healthy control subjects. Methods: Twenty LETM eyes and 20 normal control eyes were included in the study and subjected to optical coherence tomography to evaluate and compare RNFL thickness. Results: Significant RNFL thinning was observed at the 8 o'clock position in LETM eyes as compared to the control eyes (P = 0.038). No significant differences were seen in the other RNFL measurements. Conclusion: Even in the absence of previous optic neuritis, LETM can lead to subclinical axonal damage and focal RNFL thinning.